An Introduction to Rocker: Docker Containers for R 
by Carl Boettiger, Dirk Eddelbuettel 


Abstract We describe the Rocker project, which provides a widely-used suite of Docker images with 
customized R environments for particular tasks. We discuss how this suite is organized, and how these 
tools can increase portability, scaling, reproducibility, and convenience for R users and developers. 


Introduction 


The Rocker project was launched in October 2014 as a collaboration between the authors to provide 
high-quality Docker images containing the R environment (Boettiger and Eddelbuettel, 2014). Since 
that time, the project has seen both considerable uptake in the community and substantial development 
and evolution. Here we seek to document the project’s objectives and uses. 


What is Docker? 


Docker is a popular open-source tool to create, distribute, deploy, and run software applications 
using containers. Containers provide a virtual environment (see Clark et al. (2014) for an overview of 
common virtual environments) that bundles all operating-system components an application needs to 
run. Docker containers are lightweight as they share the operating system kernel, start instantly using 
a layered filesystem which minimizes disk footprint and download time, are built on open standards 
that run on all major platforms (Linux, Mac, Windows), and provide an added layer of security by 
running an application in an isolated environment (Docker, 2015). Familiarity with a few key terms is 
helpful in understanding this paper. The term “container” refers to an isolated software environment 
on a computer. R users can think of running a container as analogous to loading an R package; a 
container is an active instance of a static Docker image. A Docker “image” is a binary archive of that 
software, analogous to an R binary package: a given version is downloaded only once, and can then 
be “run” to create a container whenever it is needed. A “Dockerfile” is a recipe, the source-code, to 
create a Docker image. Pre-built Docker images are publicly available through Docker Hub, which 
plays a role for central distribution similar to CRAN in our analogy. Development and contributions 
to the Rocker project focus on the construction, organization and maintenance of these Dockerfiles. 


Design principles and use cases 


Docker gives users very convenient access to pre-configured and pre-built binary images that “just 
work”. This allows R users to access a wider variety of ready-to-use environments than provided by 
either the R Project itself or, say, their distribution, which will generally focus on one (current) release. 
For example, R users on Windows may run RStudio® Server or Shiny® Server locally just by launching 
a single command (once Docker itself is installed). Another common use-case is access to R-devel 
without affecting the local system. Here, we detail some of the principal use cases motivating these 
containerized versions of R environments, and the design principles that help make them work. 


Portability: From laptop to cloud 


One common use case for Rocker containers is to provide a fast and reliable mechanism to deploy a 
custom R environment to a remote server, such as Amazon Web Services Elastic Compute (AWS EC2), 
DigitalOcean, NSF’s Jetstream servers (Stewart et al., 2015), or private or institutional server hardware. 
Rocker containers are also easy to run locally on most modern laptops using Windows, MacOS, or 
Linux-based operating systems. By sharing volumes with the local host, users can still manipulate 
files with familiar, native tools while performing computation through a reproducible, containerized 
environment (Boettiger, 2015). Being able to test code in a predictable, pre-configured R environment 
on a local machine and to then run the same code in an identical environment on a remote server (e.g., 
for access to greater RAM, more processors, or merely to free up the local machine from a long-running 
computation) is essential for low-friction scaling of analysis. Without such containerization, getting 
code to run appropriately in a remote environment can be a major undertaking, requiring both time 
and knowledge many would-be users may not have. 


For instance, on any platform with Docker installed, the following Docker command will launch a 
Rocker container providing the RStudio® server environment over a web interface. 


wget -qO- https://get.docker.com/ | sh 
sudo docker run -p 8787:8787 -e PASSWORD=<PICK-A-PASSWORD> rocker/rstudio 


The docker run option -p sets the port on which RStudio® will appear, for which 8787 is the 
default. (Adding your user to the docker group avoids the need for a sudo command to call docker: 
sudo usermod -aG docker $USER.) Many academic and commercial cloud providers make it possible 
to execute such code snippets when a container is launched, without ever needing to ssh into the 
machine. The user may log into the server merely by pasting its IP address or DNS name (followed 
by the chosen port, e.g., :8787) into a browser and entering the appropriate password. This provides 
the user with a familiar, interactive environment running on a remote machine while requiring a 
minimum of expertise. 
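

The port mapping described above can be made explicit with a small helper. The following is a minimal sketch and not part of Rocker itself: the function name rstudio_cmd is hypothetical, and it only prints the docker run command so that it can be inspected before being executed. It assumes, as above, that RStudio® Server listens on port 8787 inside the container.

```shell
# rstudio_cmd HOST_PORT PASSWORD: print (not run) a docker command.
# The helper name and the print-only design are illustrative assumptions.
rstudio_cmd() {
    host_port=${1:-8787}
    password=${2:?a password is required}
    # -p HOST:CONTAINER maps a host port to 8787, the RStudio Server default
    printf 'docker run -p %s:8787 -e PASSWORD=%s rocker/rstudio\n' \
        "$host_port" "$password"
}

# Serve RStudio on host port 8080 instead of the default
rstudio_cmd 8080 example-password
```

Once the printed command is run, pointing a browser at port 8080 of the host should reach the RStudio® login page.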


This portability is also valuable in an instructional context. Requiring students to install all 
necessary software on personal laptops can be particularly challenging for short workshops, where 
download and installation time and troubleshooting across heterogeneous machines can prove time 
consuming and frustrating for students and instructors alike. By deploying a Rocker image or Rocker- 
derived image (see Extensibility) on a cloud machine, an instructor can easily provide all students 
access to the pre-configured software environment using only the browser on their laptops. This 
strategy has proven effective in our own experience in both workshops and semester-length courses. 
Similar Docker-based cloud deployments have been scaled to courses of hundreds of students, e.g., at Duke 
(Çetinkaya-Rundel and Rundel, 2017) and UC Berkeley (UC Berkeley, 2017). 


HPC application 


The portability of Rocker images can be particularly valuable in High Performance Computing contexts. 
Setting up a specific R environment on High Performance Computing platforms and other centrally 
administrated multi-user machines or clusters has traditionally been challenging due to restrictions on 
root access that may be needed to install certain libraries. Versions of R and packages installed by the 
system administrator may also lag behind the most recent releases. Deploying Docker containers on 
HPC systems has previously been problematic since most system administrators do not 
want to allow the elevated user permissions the Docker runtime environment requires. To work around 
this problem, Lawrence Berkeley National Labs (LBNL) has created Singularity (Lawrence Berkeley 
National Laboratories, 2017), a container runtime environment that users can both install and use to 
run most Docker containers without requiring root privileges. Singularity has seen rapid adoption 
in the HPC community (http://singularity.lbl.gov/install-request). Rocker containers can be 
run through Singularity with a single command much like the native Docker commands, e.g. 


singularity exec docker://rocker/tidyverse:latest R 


More details can be found in the Singularity documentation. 


Interfaces 


An important aspect of the Rocker project design is the ability for users to interact with the software 
on the container through either an interactive shell session (such as the R shell or a bash shell), or 
through a web browser accessing the RStudio® Server integrated development environment (IDE). 
Traditional remote and high-performance computing workflows for R users have usually required the 
use of ssh and a terminal-only interface, posing a challenge for interactive graphics and a barrier to 
users unfamiliar with these tools and environments. Accessing an RStudio® container through the 
browser removes these barriers. Rocker images include the RStudio-server software pre-installed and 
configured with the explicit permission of RStudio® Inc. 


Users can access a bash shell running as root within a Rocker container using 
docker exec -ti <container-id> bash 


which can be useful for administrative tasks such as installing system dependencies. All Rocker 
images can also be run as an interactive R, Rscript or bash shell without running RStudio, which can 
be useful for batch jobs or for anyone who prefers that environment. 


As with any interactive Docker container, users should specify the terminal (-t) and interactive 
(-i) flags (here combined as -ti), and specify the desired executable environment 
(e.g., R, though other common options could be Rscript or bash): 


docker run --rm -ti rocker/tidyverse R 


This example shows the use of the --rm flag to indicate that the container should be removed 
when the interactive session is finished. Details on sharing volumes, managing user permissions, and 
more can be found on the Rocker website, https://rocker-project.org. 
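

For batch jobs, the same pattern extends to Rscript. The sketch below is illustrative only: the helper batch_cmd and the mount point /work are our own assumptions, and the function prints the command rather than running it. It relies on the standard docker run flags -v (share a host directory with the container) and -w (set the working directory inside the container).

```shell
# batch_cmd SCRIPT [IMAGE]: print a docker command that runs SCRIPT
# non-interactively. Helper name and /work mount point are hypothetical.
batch_cmd() {
    script=$1
    image=${2:-rocker/tidyverse}
    # Share the current host directory as /work and run the script from there;
    # --rm removes the container once the script has finished.
    printf 'docker run --rm -v "%s":/work -w /work %s Rscript %s\n' \
        "$PWD" "$image" "$script"
}

batch_cmd analysis.R
```

Because the current directory is shared with the container, any files the script writes there remain available on the host after the container is removed.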


Sandboxed 


Another feature of Rocker containers is the ability to provide a sandboxed environment, isolated from 
software and potentially from other data on the machine. Many users are reluctant to upgrade their 
suite of installed packages, which may break their existing code or even their R environment if the 
installation goes poorly. However, upgrading packages and/or the R environment is often necessary 
to run analyses from a colleague, or access more recent methods. Rocker offers an easy solution. For 
instance, a user can run R code requiring the most recent versions of R and related packages inside a 
Rocker container without having to upgrade their local installations first. Conversely, one could use 
Rocker to run code on an older R release with prior versions of R packages, again without having to 
make any alteration to one’s local R install. Another common use case is to access a container with 
support for particular options such as using gcc or clang compiler sanitizers (Eddelbuettel, 2014). 
These require R itself be built with specialized settings that may not be available or familiar 
to many R users on their native system, but can be easily deployed by pulling the Rocker images 
rocker/r-devel-san or rocker/r-devel-ubsan-clang. 


This sandboxing feature is also valuable in the remote computing context, allowing system 
administrators to grant users freedom to install software which requires root privileges inside a 
container, while not granting them root access on the host machine. Root access is required to launch 
Docker containers, though not to access containers already running and providing some service such 
as RStudio. Users logging into a container through the RStudio® interface do not by default have root 
privileges, though are able to install R packages. Granting these users root privileges in the container 
still leaves them sandboxed from the host machine. Sandboxing also serves an important function in 
reproducible research by making it easier to test a specified environment in isolation from the host 
machine. Unlike traditional virtual machines, containers do not impose a large footprint of reserved 
resources, and a typical host can easily support hundreds of containers (Docker, 2015). 


Transparent 


Users can easily determine the software stack installed on any Rocker image by examining the 
associated Dockerfile recipe, which provides a concise, human-readable record of the installation. All 
Rocker images use automated builds through Docker Hub, which also acts as the central, default 
repository distributing the images. Using automated builds rather than uploading pre-built image 
binaries to Docker Hub avoids the potential for the build not to match the recipe. The corresponding 
Dockerfile is visible both on the Docker Hub and in the linked GitHub repository, which provides 
a transparent versioned history of all changes made to these recipes, as well as documentation, a 
community wiki, and issue trackers for discussing proposed changes, bugs, and improvements to the 
Dockerfiles and for troubleshooting any issues users may encounter. Having these public source files built 
automatically by a trusted provider (Docker Hub), rather than built locally and uploaded as binaries, 
is also useful from a security perspective in avoiding malware. 


Community optimized 


Having a shared, transparent computational environment created by a publicly hosted, reproducible 
recipe facilitates community input into configuration details. R and many of its packages and related 
software can be configured with a wide range of options, compilers, different linear-algebra libraries 
and so forth. While this flexibility reflects varying needs, many users rely on default settings which 
are most often optimized for simplicity of installation rather than performance. The 
Rocker recipes reflect significant community input on these choices. This helps create a more finely 
tuned, optimized reference implementation of the R environment as well as a platform for comparing 
and discussing these concerns which are often overlooked elsewhere. Issues and Pull Requests on the 
Rocker repositories on GitHub attest to some of these discussions and improvements. In particular, 
input from Docker Inc. employees through the official approval process for the r-base image, 
expertise from the Debian R maintainer and other Debian developers, and both direct and indirect 
feedback from the experience and user-generated documentation of many early adopters in the R 
community have helped shape and strengthen the project over the past few years. Widespread use of 
the Rocker images helps promote both testing of these choices and contributions that further tweak the 
configuration from many members of the R community. 


Versioned 


Access to specific versions of software can be important for users who need computational 
reproducibility more than having the latest release of any piece of software, since subsequent releases can 
alter the behavior of code, introduce errors or otherwise alter previous results. The versioned stack 
(r-ver, rstudio, tidyverse, verse, and geospatial) provides images which are intended to build an 
identical software stack every time, regardless of the release of new libraries and packages. Users 
should specify an R version tag in the Docker image name to request a version-stable image, e.g., 
rocker/verse:3.4.0. If no tag is explicitly requested, Docker will provide the image with the tag 
:latest, which will always have the latest available versions of the software (built nightly). 


Users building on the version-tagged images will by default use the MRAN snapshot mirror 
(Revolution Analytics, 2017) associated with the most recent date for which that image was current. 
This ensures that a Dockerfile building FROM rocker/verse:3.4.1 will only install R package versions 
that were available on CRAN on 2017-06-30, i.e., the day R 3.4.1 was released. This default can of 
course be overridden in the standard R manner, e.g., by specifying a different CRAN mirror explicitly 
in any command to install packages, such as install.packages(), or by adjusting the default CRAN 
mirror with options(repos=<CRAN-MIRROR>) in an .Rprofile. Note that the MRAN date associated with 
the current release (e.g., 3.4.2 at the time of writing) will continue to advance on the Docker Hub 
image until the next R release. Software installed from apt-get in these images will come from the 
stable Debian release (stretch or jessie) and thus not change versions (though it will receive 
security patches). Packages installed from Bioconductor using the biocLite() utility will also install 
the version appropriate to the version of R found on the system (the Bioconductor semi-annual release 
model avoids the need for an MRAN mirror). Users installing packages from GitHub or other sources 
can request a specific git release tag or hash for a more reproducible build, or adopt an alternative 
approach such as packrat (Ushey et al., 2016). A more general discussion of the use and limitations of 
Docker for computational reproducibility can be found in Boettiger (2015). 


Extensible 


Any portable computational environment faces an inevitable tension between the “kitchen sink 
problem” at one extreme, and the “discovery problem” on the other. A kitchen sink image seeks 
to accommodate too many use cases in a single image. Such images are inevitably very large and 
thus slow or difficult to deploy, maintain and optimize. At the other extreme, providing too many 
specialized images makes it more difficult for a user to discover the one they need. The Rocker project 
seeks to avoid both of these problems by providing a carefully-curated suite of images that can be easily 
extended by individuals and communities. 


To make extensions transparent and persistent, Rocker images can be extended by any user 
by writing their own Dockerfiles based on an appropriate Rocker image. The Dockerfiles in the 
Rocker stack should themselves provide a simple example of this (as described in the following 
section). A user begins by selecting an appropriate base image for their needs: if the RStudio 
interface is desired, a user might start with FROM rocker/rstudio; an image for testing an R package 
with compiled code might use FROM rocker/r-devel-san, and an image for reproducing a data 
analysis will probably select a stable version tag in addition to an appropriate base library, e.g., FROM 
rocker/tidyverse:3.4.1. Users can easily add additional software to any running Rocker image 
using the standard R and Debian mechanisms. Details on how to extend Rocker images can be found 
at https://rocker-project.org. 
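

As a minimal sketch of such an extension, the script below writes a short Dockerfile that builds on a version-tagged Rocker image. The choice of system library (libxml2-dev) and R package (XML) is a placeholder picked for illustration, and the image name my-analysis in the final comment is hypothetical; the Dockerfiles in the Rocker repositories remain the reference for how the project's own images are built.

```shell
# Work in a scratch directory so that no existing file is overwritten.
build_dir=$(mktemp -d)

# Write a minimal Dockerfile extending a version-tagged Rocker image.
# The quoted 'EOF' keeps the shell from expanding anything inside.
cat > "$build_dir/Dockerfile" <<'EOF'
FROM rocker/tidyverse:3.4.1

# System dependency, installed from the Debian repositories
RUN apt-get update \
  && apt-get install -y --no-install-recommends libxml2-dev \
  && rm -rf /var/lib/apt/lists/*

# R package, installed from the image's default CRAN mirror
RUN R -e "install.packages('XML')"
EOF

echo "Dockerfile written to $build_dir"

# To build the image (requires Docker; the image name is hypothetical):
#   docker build -t my-analysis "$build_dir"
```

Keeping the additions in a Dockerfile, rather than installing them by hand in a running container, is what makes the extension transparent and repeatable.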


Sharing these Dockerfiles can also facilitate the emergence of extensions tuned to particular 
communities. For instance, the rocker/geospatial image emerged from the input of a number of 
Rocker users all adding common geospatial libraries and packages on top of the existing Rocker 
images. This coalescence helped create a more fine-tuned image with broad support for a wide 
range of commonly-used data formats and libraries. Other community images are developed and 
maintained independently of the Rocker project, such as the popgen image of population-genetics- 
oriented software developed by the National Evolutionary Synthesis Center (NESCent). Rocker images 
are also being used as base Docker images in the NSF sponsored Whole Tale project for reproducible 
computing (Ludaescher et al., 2017), and are heavily used by the rhub project in automated package 
testing (Csardi, 2017). 


Rocker organization and workflow 


The Rocker project consists of a suite of images built automatically by and hosted on the Docker 
Hub, https://hub.docker.com/r/rocker. Source Dockerfiles, supporting scripts and documentation 
are hosted on GitHub under the organization rocker-org, https://github.com/rocker-org. The 
issue tracker and pull requests are used for community input, discussions, and contributions to these 
images. The Rocker project wiki, https://github.com/rocker-org/rocker/wiki, provides a place to 
synthesize community-contributed documentation, use-cases, and other knowledge about using the 
Rocker images. 


Images in the Rocker Project 


The Rocker project aims to provide a small core of Docker images that serve as convenient ‘base’ 
images on which other users can build custom R environments by writing their own Dockerfiles, 
while also providing a ‘batteries included’ approach to images that can be used out of the box. 
Balancing diverse needs driven by very different use cases against the overarching goals 
of creating images that are still sufficiently light-weight, easy to use, and easy to maintain is a difficult 
art. The implementation in both individual Rocker images and image stacks can never perfect that 
balance for everyone, but today reflects the considerable community input and testing over the past 
few years. 


All Rocker images are based on the Debian Linux distribution. It provides a small base image, 
the well-known apt package management system, and a rich ecosystem of software libraries, making 
it the base image of choice for Docker images, including many of the “official” images maintained 
by Docker’s own development team. The Debian platform is also perhaps the best-supported Linux 
platform within the R community, including an active r-sig-debian listserve. The relatively long 
period between stable Debian releases (roughly two years recently) means that software in the Debian 
stable (e.g., debian:jessie, debian:stretch) releases can lag significantly behind current releases 
of popular software, including R. More recent versions of packages can be found in the pre-release 
distribution, debian:testing, while the very latest binary builds can be found on debian:unstable. 
The Rocker project can be largely divided into two stacks which address different needs, reflected in 
which Debian distribution they are based on. The first stack is based on debian:testing. The second, 
more recently-introduced stack is based only on Debian stable releases. Rocker images always point 
to specific stable releases (jessie, stretch), and do not use the tag debian:stable, which is a rolling 
tag that always points to the most recent stable version. The different Rocker stacks have different 
aims and thus provide different images, as shown in Tables 1 & 2 below. 


The debian:testing-based images 


The debian:testing stack aims to make the most efficient use of upstream builds: the pre-compiled 
.deb binaries provided by the Debian repositories. It is both quicker and easier to install software from 
binaries, since the package manager (apt) manages the necessary (binary) dependencies and bypasses 
the time-consuming process of compiling from source. Basing this stack on debian:testing means 
that much more recent versions of commonly-used libraries and compilers are available as binaries 
than would be found in a Debian stable release. In order to provide optional access to the most recent 
available binaries, this stack uses apt-pinning (Debian Project, 2017) to allow the apt package manager 
to selectively install binaries from debian:unstable, which represents the most recent set of packages 
built for Debian. Similarly, recent versions of many popular R packages can also be installed pre-built 
through the package manager, e.g., apt-get install r-cran-xml. This can be particularly helpful 
for packages with external system dependencies (such as libxml2-dev in this example), which cannot 
be installed from the R console because they are system libraries rather than R packages. 
We should note, however, that only about 500 of the over 11,000 CRAN packages are 
available as Debian packages. 


As the names testing and unstable imply, particular versions of packages can change as packages 
move from unstable into testing. New versions are sent to unstable during the normal course of 
Debian development. This can occasionally break a previously-working installation command in a 
Dockerfile until the maintainer redirects the package manager to install a package from the unstable 
sources that could previously be installed from testing, or vice versa (using the -t option in apt). 
That said, packages only migrate from unstable to testing after a period of several days, and only if the 
migration and installation of the particular version is free of interactions with other packages in their 
dependency graph. That way, unstable serves as a validation lab which leaves testing reasonably 
stable yet current. 


Relative to stable, the testing stack thus offers some advantages as almost all software can 
be installed through the package manager. Installation of binary packages from testing generally 
provides the most recent available software, and installs it quickly as a binary. On the other hand, these 
Dockerfiles may require occasional maintenance when packages migrate and/or versions change. 
The resulting images are also inherently dynamic: rebuilding the same Dockerfile months or years 
apart will generate images with significantly different versions of software installed as the pool of 
underlying packages changes through time. 


Images overview 


The debian:testing-based stack currently includes seven images actively maintained by the Rocker 
development team (Table 1). r-base builds on debian:testing, and the other six in the stack each 
build directly from r-base. The r-base image is unique in that it is designated as the official image for 
the R language by the Docker organization itself. This official image is reviewed and then built by 
employees of Docker Inc. based on a Dockerfile maintained by the Rocker team. Consequently, users 
should refer to this image in Docker commands without an organization namespace, e.g., docker run 
-ti r-base to access the official image. All other images in the Rocker project are not individually 
reviewed and built by Docker Inc. and must be referenced using the rocker namespace, e.g., docker 
run -ti rocker/r-devel. 


Several of the images in this stack are oriented towards the R development community: r-devel, 
drd, r-devel-san, and r-devel-ubsan-clang, all of which add a copy of the development version of 
R side-by-side to the current release of R provided by r-base. On these images, the development 
version is aliased to RD to distinguish it from the current release, R. As the names suggest, each provides 
a slightly different configuration. Of particular interest are the images providing development R built 
with support for C/C++ address and undefined-behavior sanitizers, which are somewhat difficult to 
configure (Eddelbuettel, 2014). 

As these images focus on developers and/or as base images for custom uses, this stack does not 
include many specific R packages. Additional dependencies and packages can easily be installed from 
apt. R packages not available in the apt repositories can be installed directly from CRAN using either 
R or the littler scripts, as described in https://rocker-project.org/use. 


This stack also includes the images shiny and rstudio:testing that provide Shiny server and 
RStudio® server IDE from RStudio® Inc, built on the r-base image. RStudio® and Shiny are registered 
trademarks of RStudio Inc, and their use and the distribution of their software in binary form on 
Docker Hub has been granted to the Rocker project by explicit permission from RStudio. Users should 
review RStudio®’s trademark use policy (http://www.rstudio.com/about/trademark/) and address 
inquiries about further distribution or other questions to permissions@rstudio.com. The Rocker 
project also provides images with RStudio® server and Shiny server in the stable versioned stack. 


Build schedule: The official r-base image is rebuilt by Docker following any updates to the official 
debian images (roughly every few weeks). The rest of the stack uses build triggers that rebuild the 
images whenever r-base is updated or the Dockerfile sources are updated on the corresponding 
GitHub repository. The only exception in this stack is the drd image, which is rebuilt each week by a 
cron trigger. 


Table 1: The debian:testing image stack 


image                description                                            size    downloads 
r-base               official image with current version of R               254 MB  632,000 
r-devel              R-devel added side-by-side to r-base (using alias RD)  1 GB    4,000 
drd                  lightweight r-devel, built weekly                      571 MB  4,000 
r-devel-san          as r-devel, but built with compiler sanitizers         1.1 GB  1,000 
r-devel-ubsan-clang  sanitizers, clang C compiler (instead of gcc)          1.1 GB  525 
rstudio:testing      rstudio on debian:testing                              1.1 GB  1,000 
shiny                shiny-server on r-base                                 409 MB  123,000 


Table 2: The rocker-versioned stack of images 


image       description                                   size    downloads 
r-ver       version-stable base R & src build tools       219 MB  6,000 
rstudio     adds rstudio                                  334 MB  314,000 
tidyverse   adds tidyverse & devtools                     656 MB  83,000 ¹ 
verse       adds java, tex & publishing-related packages  947 MB  9,000 
geospatial  adds geospatial libraries                     1.3 GB  4,000 


¹ This figure includes 49,000 downloads under the earlier name hadleyverse. 


The debian:stable-based stack 


This stack emphasizes stability and reproducibility of the Docker build. This stack was introduced 
much more recently (November 2016) in response to considerable user input and requests. The key 
feature of this stack is the ability to run older versions of R along with the then-contemporaneous 
versions of R packages. A user specifies the version desired using an image tag, e.g., rocker/r-ver:3.3.1 
will refer to an image with R version 3.3.1 installed. Omitting the tag is equivalent to using the tag 
latest, which, as the name implies, will always point to an image using the current R release. Thus, 
users who want to create downstream Dockerfiles that are based on the current release at the 
time (but will continue to reconstruct the same environment in the future after newer R versions are 
released) should explicitly include the corresponding version tag, e.g., rocker/r-ver:3.4.2 at the 
time of writing, and not the latest tag. Users can also run the current development version of R using 
the tag devel, which is built nightly from the R-devel sources in Subversion. 


MRAN archives: To facilitate installation of only contemporaneous versions of R packages on 
these images, the default CRAN mirror from which to install R packages is fixed to a snapshot of CRAN 
corresponding to the last date for which that version of R was current (e.g., 3.4.2 was released on 
2017-09-28, thus 3.4.1 is pinned to the MRAN snapshot for that date). These snapshots are provided by the 
MRAN archive created by Revolution Analytics (now part of Microsoft). It archives daily snapshots 
of all of CRAN from which a user can install packages with the usual install.packages() function 
(Revolution Analytics, 2017). Users can always override this default by passing any current CRAN 
repository explicitly. Unlike CRAN, Bioconductor only updates its repositories through semi-annual 
releases aligned to R’s spring release schedule. Thus, Bioconductor packages can be installed in the 
usual way using biocLite, which automatically selects the Bioconductor release corresponding to the 
version of R in use. 
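

To make the pinning concrete, the sketch below composes the R call that would install a package from a dated snapshot. It is illustrative only: the helper names are hypothetical, the URL pattern is an assumption based on MRAN's public snapshot layout at the time of writing, and the functions print text rather than contacting any server.

```shell
# mran_repo DATE: print the snapshot URL for a YYYY-MM-DD date.
# The URL pattern is an assumption, not taken from the Rocker sources.
mran_repo() {
    printf 'https://mran.microsoft.com/snapshot/%s\n' "$1"
}

# install_from_snapshot PKG DATE: print an R call that installs PKG
# from the snapshot for DATE, using install.packages()'s repos argument.
install_from_snapshot() {
    pkg=$1
    snapshot_date=$2
    printf "install.packages('%s', repos = '%s')\n" \
        "$pkg" "$(mran_repo "$snapshot_date")"
}

# R 3.4.1 images are pinned to the day 3.4.2 was released
install_from_snapshot xml2 2017-09-28
```

Passing a different repository URL in the same repos argument is how a user would override the pinned default.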


Version tags: The version tags are propagated throughout this stack: e.g., rocker/tidyverse:devel 
will provide the currently-released versions of the R packages in the tidyverse (Wickham, 2017) 
installed on the nightly build of R-devel. Developers building packages on this stack are encouraged to 
tag their images accordingly as well. Table 3 indicates which versions of R are currently available in the 
stack, going back to 3.1.0. While older versions may be added to the stack at a later date, we note that 
the MRAN snapshots began on 2014-09-17 and thus go back only to the R 3.1 era. Each tag must be 
built from a separate Dockerfile, enabling minor differences in the build instructions to accommodate 
changing dependencies. Dockerfiles for past versions (e.g., prior to 3.4.2 currently) are intended to 
remain static over the long term, while the Dockerfiles for the current version, latest, and devel may be 
tweaked to accommodate new features or dependencies. Version tags also follow semantic-versioning 
conventions, so that omitting the second or third position of the tag is identical to asking for the most 
recent matching version: i.e., rocker/verse:3.3 is the same as rocker/verse:3.3.3, and rocker/verse:3 is 
(at the time of writing) rocker/verse:3.4.2. This is accomplished using post-build hooks in Docker Hub; 
see examples at https://github.com/rocker-org/rocker-versioned/ for details. 
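
The tag semantics above can be illustrated with a small shell sketch. It models only the resolution rule (a partial tag maps to the highest matching full version) and is not the post-build hook used by the project; the function name resolve_tag is hypothetical, and the version list is taken from Table 3.

```sh
#!/bin/sh
# Hypothetical model of the tag semantics: given a partial tag and a list of
# full versions, print the highest version that the partial tag refers to.
resolve_tag() {
  prefix=$1
  shift
  printf '%s\n' "$@" |
    # Keep exact matches and versions that start with "<prefix>."
    awk -v p="$prefix" '$0 == p || index($0, p ".") == 1' |
    # Sort numerically on major, minor, and patch fields.
    sort -t. -k1,1n -k2,2n -k3,3n |
    tail -n 1
}

# Numeric tags listed in Table 3.
versions="3.1.0 3.2.0 3.3.0 3.3.1 3.3.2 3.3.3 3.4.0 3.4.1 3.4.2"

# $versions is left unquoted on purpose so that it splits into arguments.
resolve_tag 3.3 $versions   # prints 3.3.3
resolve_tag 3 $versions     # prints 3.4.2
```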


Installation: In this stack, the desired version of R is always built directly from source rather than 
the apt repositories. Compilers and dependencies are still installed from the stable apt repositories, 
and thus lag behind the more recent versions found in the testing stack. Version tags 3.3.3 and 
older are based on the Debian 8.0 release, code-named jessie, while 3.4.0-3.4.2, devel, and latest 
are based on Debian 9.0, stretch (released 2017-06-17, while R was at 3.4.0), and thus have access 
to much newer versions of common system dependencies and compilers. Dependencies needed to 
compile R that are not required at runtime are removed once R is installed, keeping the base images 
light-weight for faster download times. While most system dependencies required by common R 
packages can still be installed from the apt repositories, occasionally a more recent version must be 
compiled from source (e.g., the Gibbs sampling program JAGS (Plummer, 2017) and the geospatial 
toolkit GDAL must both be compiled from source on debian:jessie images). In this stack, users 
should avoid installing R packages using apt without careful consideration: doing so installs a second 
(probably different) version of R from the Debian repositories, along with a dated version of the R 
package, since any r-cran-pkgname package in the Debian repositories depends on r-base in apt as well. 


Build schedule: All images are built automatically from their corresponding Dockerfiles (found 
in the GitHub repositories rocker-org/rocker-versioned and rocker-org/geospatial). A cron job 
sends nightly build triggers to Docker Hub to rebuild the latest and devel tagged images throughout 
the stack. To decrease load on the hub, build triggers for the numeric version tags are sent monthly. 
Although the Dockerfiles for older R versions install an almost-identical software environment every 
time, the monthly rebuilding of these images on Docker Hub ensures they continue to receive Debian 
security updates from upstream, and proves the build recipe still executes successfully. Note that 
rebuilding images with software from external repositories never produces a bit-wise identical image, 
and thus the image identifier hash will change at each build. 
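
The schedule described above can be sketched as a small shell function that a nightly cron job might call to decide which tags to rebuild. This is an illustration only: the function name tags_due and the choice of the first day of the month for the monthly rebuild are assumptions, and the step that actually sends build triggers to Docker Hub is deliberately omitted.

```sh
#!/bin/sh
# Numeric version tags listed in Table 3.
NUMERIC_TAGS="3.4.2 3.4.1 3.4.0 3.3.3 3.3.2 3.3.1 3.3.0 3.2.0 3.1.0"

# Hypothetical helper: print the tags whose rebuild is due on a given day of
# the month. latest and devel are due every night; numeric tags are due once a
# month (the 1st is an arbitrary choice made for this sketch).
tags_due() {
  day=${1#0}   # strip a leading zero, e.g. "01" -> "1"
  if [ "$day" -eq 1 ]; then
    printf 'latest devel %s\n' "$NUMERIC_TAGS"
  else
    printf 'latest devel\n'
  fi
}

# Tags that would be triggered if the job ran today.
tags_due "$(date +%d)"
```
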


Images overview 


In this stack, each image builds on the previous image, rather than all other images building directly 
on the base image, as in the testing stack. Table 2 lists the names and descriptions of the five images 
in this stack, along with image size and approximate download counts from Docker Hub. Sizes 
reflect (compressed) cumulative size: a user who has already downloaded the most recent version 
of r-ver and then pulls a copy of the rstudio image will only need to download the additional 115 MB 
in the rstudio layers and not the full 334 MB listed. This linear design limits flexibility (no option 
for tidyverse without rstudio) but simplifies use and maintenance. While no single environment 
will be optimal for everyone, both the packages selected in this stack and the stack ordering reflect 
considerable community input and tuning. 


The rstudio image includes a lightweight, easy-to-use and Docker-friendly init system, s6 (Bercot, 
2017), for running persistent services, including the RStudio® server. This system provides a convenient 
way for downstream Dockerfile developers to add further persistent services (such as an ssh server) 
to a single container, or to add start-up or shutdown scripts that should be run when a container 
starts up or shuts down. The rstudio image uses such a start-up script to configure user settings such 
as login password and permissions through environment variables at run time. 
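
A minimal sketch of passing such a setting at run time is shown below. The variable name PASSWORD and the port 8787 are assumptions drawn from common usage of the rocker/rstudio image rather than from the text above, and the password value is a placeholder. The helper run prints the command by default so that the snippet can be tried without Docker installed.

```sh
#!/bin/sh
# Print the command by default; set DRY_RUN=0 to execute it (requires Docker).
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Start an RStudio server container, setting the login password through an
# environment variable (assumed name: PASSWORD) and publishing the assumed
# server port 8787 on the host.
run docker run --rm -p 8787:8787 -e PASSWORD=choose-a-password rocker/rstudio:3.4.2
```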


The tidyverse image contains all required and suggested dependencies of the commonly-used 
tidyverse and devtools R packages, including external database libraries (e.g., MariaDB and 
PostgreSQL). Users should consult the package Dockerfiles or the installed.packages() list directly 
for a complete list of installed packages. The verse image adds commonly-used dependencies, notably 
a large but not comprehensive LaTeX environment and Java development libraries. Previously, the 
Rocker project provided the image hadleyverse, which has since been divided into tidyverse and 
verse based on community input. 


Table 3: Available tags in the rocker-versioned stack. 


tag      apt repos   MRAN date      Build frequency   images with tag 

devel    stretch     current date   nightly           r-ver, rstudio, tidyverse, verse, geospatial 
latest   stretch     current date   nightly           r-ver, rstudio, tidyverse, verse, geospatial 
3.4.2    stretch     current date   monthly           r-ver, rstudio, tidyverse, verse, geospatial 
3.4.1    stretch     2017-09-28     monthly           r-ver, rstudio, tidyverse, verse, geospatial 
3.4.0    stretch     2017-06-30     monthly           r-ver, rstudio, tidyverse, verse, geospatial 
3.3.3    jessie      2017-04-21     monthly           r-ver, rstudio, tidyverse, verse, geospatial 
3.3.2    jessie      2017-03-06     monthly           r-ver, rstudio, tidyverse, verse, geospatial 
3.3.1    jessie      2016-10-31     monthly           r-ver, rstudio, tidyverse, verse, geospatial 
3.3.0    jessie      2016-06-21     monthly           r-ver 
3.2.0    jessie      2015-06-18     monthly           r-ver 
3.1.0    jessie      2014-09-17     monthly           r-ver 


Several images in the rocker-versioned stack can be customized at build time when built locally 
(rather than pulled as prebuilt images from Docker Hub) by using the --build-arg option of docker 
build. In the r-ver image, users can set R_VERSION and BUILD_DATE (the default MRAN snapshot date). In 
the rstudio image, users can set RSTUDIO_VERSION (which otherwise defaults to the most recent version) 
and PANDOC_TEMPLATES_VERSION. 
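
As a sketch of such a local build, the snippet below composes a docker build command that sets R_VERSION and BUILD_DATE for the r-ver image, pairing R 3.4.1 with its snapshot date from Table 3. The build context ./r-ver and the image name local/r-ver:3.4.1 are hypothetical; the command is printed rather than executed unless DRY_RUN=0 is set.

```sh
#!/bin/sh
# Print the command by default; set DRY_RUN=0 to execute it (requires Docker).
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Build r-ver locally for R 3.4.1, pinned to the 2017-09-28 snapshot.
# ./r-ver stands in for a directory containing the r-ver Dockerfile.
run docker build \
  --build-arg R_VERSION=3.4.1 \
  --build-arg BUILD_DATE=2017-09-28 \
  -t local/r-ver:3.4.1 \
  ./r-ver
```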


This stack also makes use of Docker metadata labels defined by http://schema-label.org, 
indicating image license (GPL-2.0), vcs-url (GitHub repository), and vendor (Rocker Project). These 
metadata can be altered or extended in downstream images. 


Conclusions 


Over the past several years, Docker has seen immense adoption across industry and academia. The 
Open Container Initiative (The Linux Foundation: Projects, 2017) now provides an open standard 
that has further extended this container approach to research environments through projects such as 
Singularity (Lawrence Berkeley National Laboratories, 2017), allowing users to deploy containerized 
environments such as Rocker on machines where they do not have root access, such as clusters or 
private servers. Containerization promises to solve numerous challenges such as portability and 
replicability in research computing, which often relies on complex and heterogeneous software stacks 
(Boettiger, 2015). Yet implementing such environments in containers is not a trivial task, and not all 
implementations provide the same usability, portability or reproducibility. Here we have detailed the 
approach taken by the Rocker project in creating and maintaining these environments through an open 
and community-driven process. The structure of the Rocker project has evolved over three years of 
operation while drawing in an ever-widening base of academic researchers, university instructors and 
industry users. We believe this overview will be instructive not only to users and developers interested 
in the Rocker project, but also as a model for similar efforts around other environments or domains. 